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Abstract — This paper deals with improvements to the contrast 
source inversion method which is widely used in microwave 
tomography. First, the method is reviewed and wealtnesses 
of both the criterion form and the optimization strategy are 
underlined. Then, two new algorithms are proposed. Both of 
them are based on the same criterion, similar but more robust 
than the one used in contrast source inversion. The first technique 
keeps the main characteristics of the contrast source inversion 
optimization scheme but is based on a better exploitation of 
the conjugate gradient algorithm. The second technique is based 
on a preconditioned conjugate gradient algorithm and performs 
simultaneous updates of sets of unknowns that are normally 
processed sequentially. Both techniques are shovra to be more 
efficient than original contrast source inversion. 

Index Terms — Microwave Tomography, Non-linear Inversion, 
contrast source inversion. Reconstruction Methods 



I. Introduction 

The objective of microwave tomography is to reconstruct the 
permittivity and conductivity distributions of an object under 
test. This is performed from measurements of the field scat- 
tered by this object under various conditions of illumination. 
Microwave tomography has shown great potential in several 
application areas, notably biomedical imaging, non destructive 
testing and geoscience. 

Unhke other well-known imaging techniques (e.g. X-ray 
tomography), the involved wavelengths are long compared to 
the structural features of the object under test. Consequently, 
ray propagation approximation is not suitable. Instead an 
integral equation formulation, which is nonlinear and ill-posed 
[1], must be used. 

A large number of techniques have been proposed to per- 
form inversion {i.e., finding the permittivity and conductivity 
distributions from the measurements). In most of them inver- 
sion is formulated as an optimization problem. Differences 
between methods then come from the nature of the criterion 
and the type of minimization algorithm. 

Proposed criteria are made up of one, two or three terms 
with the possible addition of a regularization term. When only 
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one term is used, the quadratic error between actual and pre- 
dicted measurements is minimized. In the two- and three-term 
cases, criteria penalize both the error on the measurements and 
the error on some constraint inside the domain of interest. 

The problem being nonUnear, the criteria to be minimized 
may have local minima. To avoid being trapped in them, some 
methods use global minimization algorithms [2-6]. However, 
as the complexity of such algorithms grows very rapidly with 
the number of unknowns, most proposed methods use local 
optimization schemes and assume that no local minima will 
be encountered. 

Some of these methods are based on successive lineariza- 
tions of the problem using the Born approximation or some 
variants [7-15]. The capacity of these methods to provide 
accurate solutions to problems with large scatterers varies 
according to the used approximation. For instance, the dis- 
torted Bom method [16] (which is equivalent to the Newton- 
Kantorovich [14, 17] and Levenberg-Marquardt [12] methods) 
is more efficient for large scatterers than the Born iterative 
method [18] or some extensions like the one presented in [19]. 
Nevertheless, most of these methods require complete com- 
putation of the forward problem at each iteration which is 
computationally burdensome. 

To avoid the necessity for solving the forward problem at 
each iteration, two-term criterion techniques were proposed. 
The idea is to performed optimization not only with respect 
to electrical properties but also with respect to the total field in 
the region of interest. This is done with a so-called modified 
gradient technique in [20], a conjugate gradient algorithm in 
[21] and a quasi-Newton minimization procedure in [1]. 

The contrast source inversion (CSI) method [8,22-25] is 
also a 2-term criterion technique but this time the problem is 
formulated as a function of equivalent currents and electrical 
properties of the object under test (instead of the electric 
field and electrical properties). Optimization is performed by 
a conjugate gradient algorithm with an alternate update of the 
unknowns. 

We mentioned that, due to the nonlinearity of the problem, 
global optimization schemes could be a good choice. In [2-5], 
such algorithms are proposed for optimization of a two-term 
criterion. In [6] a simulated annealing algorithm is used to 
minimize a one -term criterion. 

In this paper, the emphasis is placed on well-known CSI 
methods which offer a good compromise between quality 
of the solution and computational effort. We address both 
questions of the form of the criterion and of the corresponding 
optimization algorithms. We first give background information 
on the CSI method; then we show that the corresponding 
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criterion exhibits unwanted characteristics and could lead to 
degenerate solutions. Two pitfalls of the optimization scheme 
are also underlined. 

Based on this analysis two new methods are proposed; both 
use the same criterion, the form of which is deduced from 
optimization and regularization theories. Therefore, these tech- 
niques only differ by their respective minimization strategies. 
The first one retains the main characteristics of the CSI method 
but is based on a better exploitation of the linear conjugate 
gradient algorithm. The second one is based on simultaneous 
updates of sets of unknowns and a preconditioned nonlinear 
conjugate gradient algorithm is used to perform the minimiza- 
tion. The two proposed methods exhibit improved robustness 
and convergence speed compared to CSI. 

Note that the scope of the study is limited to analysis of and 
improvements to techniques that retain the main characteristics 
of the original CSI methods. Further performance gains, either 
by significantly altering the nature of the objective function or 
by using approximate forms thereof, are investigated in [26] 
and [27]. 

II. Context 

In a microwave tomography experiment the objective is to 
find permittivity (e') and conductivity (cr) distributions of an 
object under test placed into a domain D and sequentially 
illuminated by M different microwave emitters. For each 
illumination, the scattered field is gathered at N points. A 
typical setup is depicted in Fig. 1. In this study, we limit 
ourselves to the 2-D TM case. This means we assume that 
all quantities are constant along the z direction and that the 
electric field is parallel to it. 




Fig. 1. Typical microwave tomography setxip 

By refering to the volume equivalence theorem and by 
using the method of moments as a discretization technique, 
the microwave tomography experiment can be described by 
the following system of equations (for more detail on the 
derivation we refer to [24]): 

For i = 1, . . . , M, 

Vi = GoWi + n° (la) 

Wi = X{Ef + G^Wi) (lb) 

where bold-italic fonts and bold-straight fonts denote column 
vectors and matrices, respectively, is a length N vector 
that contains the measured scattered field related to the ith 
illumination. Ef is the discretized incident field in D {i.e., 



the field when D is completely filled with the background 
medium). Its length is the number of discretization points, 
denoted by n. Gq and Gc are Green matrices of size N x n 
and nxn, respectively. X is a diagonal matrix such that X = 
diagja;}. x and Wi are vectors of length n and are called 
contrast vector and current vector, respectivley. Finally, n° is 
a noise vector that models all perturbations encountered in a 
microwave tomography experiment. 

Equations (la) and (lb) are called observation and coupling 
equations, respectively. The unknown quantities are x and 
W = {wi, . . . , wm) while x represents the actual quantity 
of interest, since it contains all relevant information about the 
permittivity and conductivity of the object under test. Indeed, 
X is related to the electrical characteristics as follow: 

x={e''- ei)let (2) 

where the division is performed term by term and where e'^ 
and are the discretized complex permittivity of the object 
under test and of background medium, respectively. Complex 
permittivity is expressed by 

e = e'-jc7/a; (3) 

where uj denotes the angular frequency. Estimating x is then 
equivalent to estimate e and a. 

It is possible to elinoinate Wi from (1). The following 
equation is then obtained: 

Vi = Go(I - XGc)-^X£;9 + < (4) 

where I is the identity matrix. Note that now, yi is a function 
of X only. The above expression highlights the nonlinearity of 
the problem which greatly complicates its resolution. 

Some approaches are based on (or equivalent to) solving (4) 
in the least-squares sense [12, 28]. Meanwhile, those methods 
necessitate a high calculation cost. In order to circumvent the 
difficulty, two term criterion methods are based on system 
(1) (or an equivalent system based on the total field and the 
contrast instead of the currents and the contrast [20, 24]) and 
minimize the sum of errors on both observation and coupling 
equations. Such an approach is not equivalent to solving (4), 
since the coupling equation is not fulfilled exactly. However, 
under most circumstances, the solutions are extremely similar. 
Therefore, the rest of the study is dedicated to this type of 
approach. 

III. Analysis of the CSI method 

In this section we present the background results on CSI 
and underUne some of weaknesses. Both the criterion form 
and the optimization scheme are studied. 

A. Background results on CSI 

CSI formulates the microwave tomography problem in 
the framework of optimization. The objective is to estimate 
(A,W) minimizing a given criterion F. This criterion makes 
a trade-off between the errors that respectively affect the 
observation (la) and coupling (lb) equations. A possible 
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Initialize x and W 
repeat 

for i = 1, M do 

Perform one iteration of the conjugate gradient algo- 
rithm to minimize F with respect to Wi 

end for 

Perform one iteration of the conjugate gradient algorithm 
to minimize F with respect to x 

until Convergence 



TABLE I 

Implementation of the CSI method according to [23] 



regularization term can be added to deal with the iU-posed 
nature of the problem [1]. More precisely, we have 



{x, W) = argmin F 

x,W 



F^Fi + \F2 + XrF, 
Fi = Y,\\yi-GoWi\\^ 

i 

F2 = J2 + G,Wi) - 

i 

Fr = Cl){x) 



Wi 



(5) 

(6a) 
(6b) 

(6c) 
(6d) 



where A and Ar respectively denote the weight and the 
regularization factor, ||-|| is the EucUdean norm, and ^ is a 
regularization function. 

Remark 1: In [23], the concept of multiphcative regu- 
larization was introduced. The idea is to multiply Fi + XF2 
by the penalty term F,-, instead of adding it. Here, we 
shall not pursue in this direction; we shall rather adopt the 
classical penalized least-square framework. 

CSI proposes the following expUcit expression of A as a 
function of x: 

|2 



A = Acsi = 



E^\\y^\^ 



(7) 



This choice is justified by the fact that Fi and AF2 are equal 
when the currents vanish. 

The optimization is performed according to a block- 
component scheme based on alternate updates of each set 
of unknowns. More precisely, minimization with respect to 
each Wi is performed for fixed values of x and Wj, j 7^ i; 
then, W is fixed at its current value and minimization is 
performed with respect to x. These updates form one iteration 
of the algorithm, and they are repeated until convergence. 
Each update of the CSI method consists of a single conjugate 
gradient step. Table I presents the algorithm. More details are 
given in [23]. 



B. Pitfalls of the CSI method 

It seems that both the criterion and the optimization algo- 
rithm proposed in the CSI method suffer from some weak- 
nesses. 

1) Criterion: The choice (7) for A seems to be inappro- 
priate essentially because the resulting criterion F reaches its 
minimum value for degenerate solutions if no regularization is 
used. More precisely, it is shown in Appendix A that solutions 
(x,W) exist such that 00 and Fx + Aqsi-Pj 0. 
This is obviously an undesirable feature, since it shows that 
any globally converging minimization method will produce a 
degenerate solution. As for more realistic, locally converging 
methods, they could either converge towards a local minimum, 
possibly near the expected solution, or towards a degenerate 
global minimum. In Section V, an example is shown where 
a local optimization algorithm converges toward a degenerate 
solution. 

From a theoretical standpoint, an additional regularization 
term could solve the problem, provided that F-c{x) 00 when 
1 1 a; 1 1 00. However, local (or even global) minima with large 
norms may still exist if Ar is not chosen large enough, while 
too large values of Ar will produce overregularized solutions. 

From a practical standpoint, CSI is widely used and, to 
our knowledge, convergence to degenerate solutions has not 
been reported yet. A reason could be that, in a widespread 
version of the method, the dependence of Aqsi on x is omitted 
in the calculation of the gradient component V^F. However, 
it is shown in the next section that this leads to undesirable 
properties of the optimization algorithm. 

Finally, it should be underlined that the choice (7) is not 
justified by solution quality or computation cost arguments. It 
could then be expected that better choices be possible from 
those points of view. 

2) Optimization scheme: The first problem with the CSI 
optimization scheme is to rely on conjugacy formulas in an 
unfounded way. Indeed, the conjugate gradient algorithm has 
been designed to minimize a criterion with respect to one set 
of unknowns [29]. The conjugacy of the descent directions can 
be defined only if this criterion remains the same during the 
optimization process. Meanwhile, in CSI, the criterion with 
respect to each set of unknowns change between two updates. 
For instance, between two updates of Wi at iterations k and 
fc + 1, the criterion F{wi) changes due to update of x at 
iteration k. The properties of the conjugate gradient algorithm 
then do not hold. 

The second weakness is that, in most cases (see, e.g., [23].), 
the algorithm does not use the exact expression of the gradient 
term V^F. For all methods based on (6a), we have 

Va^F = 2A ^(A| Aio; - Ajw,) + F2 V^^A + XrVa>^{x) 

i 

(8) 

where = diag{£!° + GcWi} and represents the 
transposed conjugate operation. In general, the term Va-A 
is neglected even though A is a function of x according 
to (7). The approximate gradient is also normalized by the 
ampUtude of the total field. This latter step is neglected in this 
paper not to interfere with our additive regularization term. In 
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practice, we have observed that these approximations prevent 
the convergence towards a degenerate minimizer. However, 
they also prevent the convergence towards a local minimizer, 
as illustrated in the Section V. Actually, the solution does not 
correspond to the stated goal of minimizing F. 

In the next section, new algorithms designed to avoid the 
previous pitfaUs are proposed. 

IV. Proposed algorithms 

Two new algorithms with improved performance with re- 
spect to the original CSI method are now introduced. The first 
one retains the main structure of CSI {i.e., block-component 
optimization) while compensating for its deficiencies. The 
second one is based on a simultaneous optimization scheme. 
Both algorithms use the same unique criterion based on (6a). 

We first detail the exact form of this criterion. To do so, 
the expected characteristics of the weight factor are deduced 
from the optimization theory and the role of the regularization 
term is analyzed. Then, we introduce our two optimization 
strategies. 

A. Weight factor 

The choice of parameter A is crucial since a bad choice 
may lead to an inappropriate solution or to a prohibitive 
computation time. In this subsection we deduce the expected 
characteristics of the weight factor from optimization theory 
and propose a strategy to set its value. The possible presence of 
an additional regularization term is deliberately omitted here 
since it does not interfere with the weight factor. Regulariza- 
tion issues will be discussed in Subsection IV-B. 

According to optimization theory [29], if A — > oo in (6a), 
solving (5) is equivalent to solving 

argmini^i under constraint (lb) (9) 

a!,W 

which amounts solving (1), or equivalently (4), in the least- 
squares sense. It is therefore quite natural to consider very 
large values of A in F. Unfortunately, optimization theory [29] 
also states that the minimization problem (5) becomes ill- 
conditioned for arbitrary large values of A. Practically, this 
implies that the computation time rises with A. 

According to these considerations, A should be set to a 
value large enough to approximately fulfill the constraint and 
small enough to preserve the conditioning of the optimization 
problem. This trade-off effect and its impact on both solution 
accuracy and computation time will be illustrated in the results 
section. 

To our knowledge, no unsupervised method is available to 
ensure an appropriate choice of A. Thus, we rather suggest 
to turn to a heuristic tuning. In our experiments, we have 
observed that an appropriate weight factor for a given contrast 
is still quite efficient for "similar" contrasts. We could then 
imagine that an appropriate A could be set during a training 
step, involving one or several known typical contrasts. 

We also note that the suggested Acsi presented in previous 
section also stems from heuristics since it is not justified 
by the solution quahty. Hand tuning can then be seen as 



a generalization of the existing suggestions that gives more 
flexibihty to the user, offering a trade-off between computation 
time and accuracy of the solution. 



B. Regularization 

In this section we analyze the necessity of using a regular- 
ization term in criterion F. 

The continuous-variable microwave tomography problem is 
intrinsically ill-posed [1]. After discretization, it yields an ill- 
conditioned problem [30] which, in practice, is very sensitive 
to noise: small perturbations on the measurements cause large 
variations on the solution. 

Regularization, introduced by Tikhonov [31], is a well 
known technique to overcome this difficulty. The objective 
is to restore the well-posed nature of the problem by in- 
corporating a priori information. According to Tikhonov's 
approach, this is done by adding a penalty term to the data- 
adequation component, weighted by a positive parameter Ar 
that modulates its relative importance. 

The simplest penalty term is the squared norm of the vector 
of unknowns. Here, we rather penahze the squared norm of 
first order differences of x between neighboring points in 
domain D, i.e., 

Fr = (j){x) = \\T>xf (10) 

where each row of D contains only two nonzero values, 1 and 
— 1, in order to properly implement the desired difference. 

In a number of existing methods, no penalization is incor- 
porated into the criterion. One could argue that this would 
lead to degenerate solutions due to the ill-conditioned nature 
of the problem. This is not necessarily true if the algorithm 
is stopped before convergence. Actually, it has been shown 
mathematically, in the simpler case of deconvolution-type 
inverse problems, that stopping a gradient-descent algorithm 
after K iterations has a similar effect than penalizing the 
criterion using F^ = with a weight factor Ar propor- 

tional to 1/i^ [32]. We will illustrate this equivalence for the 
microwave tomography case in the results section. 

In the sequel, we shall focus on the penalization approach 
since it appears as a more explicit way of performing regu- 
larization. From the previous analysis, we choose a criterion 
based on (6a) with (10) and with A and Ar constant and set 
heuristically. 



C. Optimization algorithms 

In this subsection we propose two schemes to minimize the 
criterion defined previously. The first one, Uke CSI, is based 
on block-component optimization. Meanwhile, it is designed 
to take full advantage of the conjugate gradient properties. 
Our second method is based on a simultaneous update of 
W and X. Unlike block-component algorithms, the simplest 
form of this procedure has the drawback of being sensitive 
to the relative scaling of unknowns. Therefore, we propose a 
technique capable of overcoming this scahng issue. 
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Initialize x = and W = W*^ 

repeat 

for i = 1, ...,M do 

Initialize Wi to 
repeat 

Perform one iteration of the linear conjugate gra- 
dient algorithm of Table IV to minimize F with 
respect to Wi according to (11) 

until Sufficient decrease of ||Vu,.F||^ 
wl ^ wl~^ + 9w{wi - w^~^) {Overrelaxation} 
end for 

k^O 

Initialize x to x^~^ 
repeat 

Perform one iteration of the linear conjugate gradient 
algorithm of Table IV to minimize F with respect to 
X according to (14) 
k^k + 1 
until Sufficient decrease of ||Va;F||^ 

^ x^~^ + O^ix — x^~^) {Overrelaxation} 

e^e + i 

until Convergence 



1) Alternated conjugate gradient method: According to our 
choices of A, Ar and F^., the criterion F is quadratic with 
respect to Wi when x and Wj, j ^ i, are held constant and is 
also quadratic with respect to x when W is held constant. 

Indeed, for all i, F admits the quadratic expression 

F{w,) = wjAwi - 2^{blwi) + Ci (11) 
as a function of Wi, where 

A = Gt Go + A(XGc - I)^ (XGc - I) (12) 

and 

6, = -Gly, + A(XGc - I^XE^ (13) 
and the quadratic expression 

F{x)^x^Clx-2^{b^x) + c (14) 
as a function of x, where 

Q = A^AjAi + A.DtD (15) 

i 

and 

6 = A^At«,, . (16) 

i 

Constants Cj and c are not defined here since they are not used 
in the following analysis. 

The linear conjugate gradient algorithm is a gradient-based 
minimization technique developed to solve linear systems, 
i.e., to minimize quadratic criteria. It has the advantage of 
producing computationally low-cost iterations while offering 
good convergence properties. It is then perfectly suited to 
minimize F with respect to each Wi and with respect to x 
(the complete form of the linear conjugate gradient algorithm 
for a complex-valued unknown vector is given in Appendix 
B, Table IV). 

We then propose an algorithm based on two nested iterative 
procedures. The main loop consists of alternated updates on 
each sets of unknowns. Each of these updates is performed by 
a hnear conjugate gradient algorithm. To truly benefit from 
the efficiency of the conjugate gradient method, we propose 
to perform several steps of the algorithm, the first of which not 
being conjugated with the last one of the previous iteration. 
This is the main difference with standard CSI method. 

We also use overrelaxation after each update. Convergence 
of the algorithm is still guaranteed due to the quadratic nature 
of the subproblems. Overrelaxation is a well-known [33] 
strategy aimed at limiting the zigzaging effect that will be 
described in IV-C2. The resulting algorithm, which will be 
referred to as alternated conjugate gradient for CSI method, 
is presented in Table II. Despite nested iterations, it appears 
that, due to a better use of conjugacy, this algorithm is faster 
than the standard CSI method. 

As indicated in Table II, we propose to perform conjugate 
gradient steps until a sufficient decrease of the gradient is ob- 
tained. The computation cost would be prohibitive if iterations 
were performed until full convergence (i.e., until l|VFl| = 0). 
We then suggest to stop the conjugate gradient algorithm once 
the initial gradient norm has been reduced by a factor of T 
i.e., to use a truncated version of conjugate gradient algorithm. 



TABLE II 

Alternated conjugate gradient for CSI algorithm 



Typical values of T and overrelaxation coefficients € [1,2) 
are T = 10 or 20 and 6^ = 0^ = 1.5. 

It should also be underlined that the minimization with 
respect to w\. . . wm can be performed in parallel, since, 
according to (11), neither A nor hi depend on the current 
values of Wj, j ^ i. 

2) Simultaneous update algorithm: Both CSI and alter- 
nated conjugate gradient for CSI algorithms rely on alternated 
updates of W and x. However, such schemes are often 
reported to be inefficient in classical numerical analysis text- 
books such as [33] (see Fig. 10.5.1 therein). More precisely, 
successive minimization along coordinate directions suffers 
from the so-called zigzaging phenomenon: if the minimizer is 
located within a narrow valley, many small steps are required 
to come close to it. 

Instead of alternated updates, minimization along conju- 
gate directions is usually recommended. In the microwave 
tomography context, we are thus driven to apply a nonlinear 
conjugate gradient algorithm to the complete set (a;,W) 
of unknown quantities. Nonhnear conjugate gradient is an 
extension of the Unear conjugate gradient whose form for 
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complex-valued unknown vector is detailed in Appendix B, 
Table V. The resulting scheme exactly corresponds to Ta- 
ble V, with V = (a;,W) (after arrangement in a column- 
vector form) and / = F. Obviously, we have V„/ ~ 
(Va;-F, VwjF, VwmP)- ^ Isss trivial property is that 
the optimal step length argmin^ f{v + ap) can be computed 
analytically. It is actually equivalent to finding the zeros of a 
third order polynomial. The expression of this polynomial is 
given in Appendix C. 

However, the resulting simultaneous optimization scheme 
has a drawback compared to alternated optimization schemes: 
X and W correspond to different physical quantities, which 
are not measured in the same unit system, and the efficiency 
of the simultaneous optimization scheme happens to depend 
significantly on the chosen units, i.e., on the respective scale 
of X and W. Rescaling the variables may result in faster 
convergence, opur observations indicate that the range for ap- 
propriate factors varies widely from experiment to experiment. 
Therefore, our approach is to devise a technique that makes the 
simultaneous optimization scheme insensitive to the relative 
scales of x and W. An elegant way to get rid of scaling 
problems is to use a suitably preconditioned version of conju- 
gate gradient algorithm. The general idea of preconditioning 
is to resort to the basic conjugate gradient algorithm after a 
linear invertible change of variables v — St)', in order to solve 
an equivalent but better-conditioned problem [29]. Table III 
displays the preconditioned conjugate gradient algorithm in the 
case of a complex-valued unknown vector. The only difference 
with Table V is the presence of the preconditioner P = SS^ 
which is a positive Hermitian matrix by construction. 



{Minimization of a non quadratic function / with respect 

to v} 

Initialize v 

repeat 

if = then 
P < — Pg 
else 

/3 ^ SR ((g - g<'i^)tPg) /{g'^yPg"'^ 
p < — Pg + f3p 
end if 

a <— argmin^j f{v + ap) 
V <— V + ap 

9"'" - g 

until Sufficient decrease of H^H^ 

TABLE III 

Preconditioned conjugate gradient algorithm in the case of a 
complex- valued unknown vector 

Here, we use a classical preconditioning strategy in which 



P is a diagonal matrix formed with the inverse of the diagonal 
entries of the Hessian of F. Thus, the algorithm becomes 
independent of a change of units. The proof is rather straight- 
forward and is omitted here. 

The main cost of such an algorithm, compared to the 
unpreconditioned conjugate gradient method, comes from the 
computation of P which must be done at each iteration. This 
computation required a matrix-vector multiphcation involving 
a n X n matrix (as detailed in Appendix D). In the results 
section we will show that despite this additional cost, pre- 
conditioned conjugate gradient performs well compared to 
alternated conjugate gradient for CSI. 

Remark 2: While the conjugate gradient algorithm for 
a complex- valued unknown vector can always be put into the 
form of Table V, Table III does not give the general form of 
preconditioned conjugate gradient in the complex case. The 
reason is that v = Sv' is not the general expression for a Un- 
ear invertible change of variables in the complex case, which 
would rather read [^{v)\ = S[SR(w')*, 

where represents the transpose operation. Therefore, 
the complex-valued vectors should be replaced by their 
equivalent real-valued representation to obtain the general 
form of preconditioned conjugate gradient. Nonetheless, 
preconditioning by a diagonal scaling matrix can always be 
implemented using complex- valued quantities, so we restrict 
ourselves to this type of implementation. 

V. Results 

We now present some experimental results. First, the pit- 
falls of the CSI method and the behavior of the proposed 
techniques are illustrated on synthetic data examples. Second, 
the global performance of the three studied algorithms (CSI, 
alternated conjugate gradient for CSI and preconditioned 
conjugate gradient methods) are compared using experimental 
data published in [34]. 

Synthetic data were generated with a 2D simulator solving 
the electric field integral equation using a pulse basis functions 
and point-matching scheme. The domain of interest D was of 
one squared wavelength (Ag x Aq) of size. We used M = N 
with emitters and receivers equally spaced on a circle with 
radius Aq /\/2 centered on D. Unless otherwise specified, M = 
32 and n = 32^ = 1024. White Gaussian noise was added to 
each set of simulated data in order to get a signal-to-noise 
ratio of 20 dB. 

Performance of the algorithms greatly varies according to 
the shape and magnitude of the object under test. Tests were 
thus performed with three different objects. The first one, 
shown in Fig. 2, is made of two concentric square cylinders 
having contrasts of 1 — jO.5, for the outer one, and 0.5 — j, 
for the inner one. The second object under test has the same 
shape but all the contrast values are multiplied by a factor 3. 
These objects will be referred to as the small square cylinder 
object and large square cylinder object, respectively. The third 
object under test, shown in Fig. 3, is made of a single circular 
cy finder with a radius of Ao/2. The contrast of the cy Under is 
constant and purely real with a value of 2. It will be referred 
to as the circular cylinder object. 
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(a) (b) 

Fig. 2. (a) Real and (b) minus the imaginary parts of the contrast of the 
small square cylinders object. The large square cylinder object has the same 
shape but all contrast values are multiplied by a factor of 3. The x and y axes 
are indexed by the sample number. 




Fig. 3. Real part of the contrast of the circular cylinder object. The imaginary 
part is zero. The x and y axes are indexed by the sample number. 

Quantitative assessment of the solution quality is performed 
using a mean square error criterion defined as 

A^ = \\x-Xof/\\xof (17) 

where Xq is the actual contrast. 

We first illustrate our claims relative to the criterion charac- 
teristics and then turn to comparisons related to the optimiza- 
tion process. 

A. Criterion characteristics 

Here we give an example where a local optimization al- 
gorithm converges toward a degenerate solution. We demon- 
strated the existence of such solutions if A = Acsi and Ar = 
in III-B and Appendix A. We used the large square cylinders 
object with iV = M = 20 and n = 20^ = 400. 

Fig. 4 presents the modulus of the contrast at the solution 
while Fig. 5 gives the amplitude of the total field in D for i = 1 
and 5. As predicted in Appendix A, the field vanishes for the 
pixels k such that \xk\ — » oo (for reference, we had \\Ef\\ = 
0.37 V/m in the middle of D for each illuminations). This is 
also true for all other illuminations. It should be underlined 
that, in the same conditions, the solution obtained with the 
small square cylinder object and circular cylinder object are 
not degenerate. 

According to these results, we suggested, in Subsection 
IV-A, to replace Acsi by a hand-tuned value of A. In Fig. 6 we 
illustrate the effect of the weight factor on both mean square 
error and computation time. The small square cylinder object 
and the alternated conjugate gradient for CSI method were 
used. 

We remark that, for increasing values of A, the computation 
time rises and quickly becomes prohibitive. On the other hand. 



x10" 




Fig. 4. Modulus of x minimizing F with A = Acs: and Ar = 0. Large 
square cylinder object. The x and y axes are indexed by the sample number. 




y y 

(a) (b) 

Fig. 5. Magnitude of total E-field (in V/m) for the degenerate solution 
presented in Fig. 4. Illuminations (a) 1 and (b) 5. The field vanishes for all 
pixels k such that \xk\ -* oo. The x and y axes are indexed by the sample 
number. 

we observe that, for any value of A above a certain thresh- 
old, the solution quaUty remains approximatively unchanged. 
Moreover, for a given range of A, both solution quality and 
computation time are acceptable. Within this range, a trade- 
off can be achieved between computation time and solution 
quality. 

Finally, we illustrate our assertions of Subsection IV-B 
about regularization. We used the small square cylinder object 
with the alternated conjugate gradient for CSI method. Fig. 7 
(a) presents the solution at convergence (only the real part 
of the contrast is displayed for the sake of clarity) when an 
unregularized criterion is used (Ai = 0). We clearly see that 
the solution is degenerate. In Fig. 7 (b) the same criterion 
was used but the algorithm was stopped before convergence. 
Finally, Fig. 7 (c) presents the solution obtained at convergence 
with a regularized criterion (A^ = 0.001). As expected, these 
two solutions are quite similar. 

B. Optimization process 

We now give examples comparing the CSI optimization 
scheme to the proposed methods. 

In Subsection III-B2 we stated without proof that the 
approximations on which standard CSI relies prevent the 
convergence toward a local minimizer of F. This point is 
illustrated in Fig. 8, which depicts the evolution of norm 
of the gradient as a function of the number of iterations. 
With the CSI approximations [23] (solid lines) the gradient 
norm goes toward a nonzero value, thereby indicating that 
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A 

Fig. 6. Effect of the weight factor X on the solution quality (mean square 
error) and on the computation time. The small square cylinder object was 
used with the alternated conjugate gradient for CSI method. 




(a) 



(b) 
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Fig. 8. Evolution of norm of the gradient with respect to x for the CSI method 
when (-) a gradient approximation is used to calculate update directions and 
when (- -) no approximation is used. Remark: the plotted quantity is the exact 
value of the norm of the gradient, not its approximation. 
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Fig. 9. Evolution of F as a function of time for CSI and alternated conjugate 
gradient for CSI (ACG) schemes: (a) small square cylinder object, (b) circular 
cylinder object. The same criterion, with A = 0.01 and Ar = 0.001, was used 
for both methods. 



(c) 

Fig. 7. Small square cylinder object, real part of the reconstruct contrast: 
(a) Unregularized criterion, optimization stopped at convergence, (b) unreg- 
ularized criterion, optimization stopped before convergence, (c) regularized 
criterion, optimization stopped at convergence. The x and y axes are indexed 
by the sample number 



the convergence point is not a local minimizer. When no 
approximations are made (dashed lines), the gradient norm 
decreases toward zero as expected. 

Another comment in Subsection III-B2 was related to the 
non-optimal exploitation of the conjugate gradient algorithm 
in the standard CSI optimization scheme. Here we validate that 
the alternated conjugate gradient for CSI scheme is actually 
more efficient than the original CSI: both methods were tested 
using the same criterion. Parameters A and Ar were set by hand 
to 0.01 and 0.001, respectively. Tests were performed with 
the small square cylinder object and on the circular cylinder 
object. 

Figs. 9 (a) and (b) present the evolution of criterion F 
as a function of time for the small square cylinder object 
and the circular cylinder object, respectively. In both cases, 
the alternated conjugate gradient for CSI algorithm is faster, 
suggesting that a better use of conjugacy pays off. Experience 
shows that the gain in computation time can be as high as 
20% depending on the object under test and on the chosen 



stopping rule. Obviously, it cannot be proved that the al- 
ternated conjugate gradient for CSI scheme is always faster 
than CSI. Nevertheless, alternated conjugate gradient for CSI 
outperformed CSI in all test cases. 

In Subsection IV-C2 we proposed algorithms performing 
simultaneous updates of the unknowns. We illustrate how these 
simultaneous conjugate gradient and preconditioned conjugate 
gradient algorithms behave and we compare them to the 
alternated conjugate gradient for CSI method. 

In Fig. 10, the evolution of the criterion is presented for un- 
preconditioned conjugate gradient, preconditioned conjugate 
gradient and alternated conjugate gradient for CSI methods. 
Two different scales were used. The results in Fig. 10 (a) were 
obtained with the small square cylinder object while those in 
Fig. 10 (b) were obtained with the circular cylinder object. 
Scaling 2 differs from scaling 1 by the fact that a factor of 
one tenth was applied to the currents. 

The scaling sensitivity of the unpreconditioned conjugate 
gradient algorithm appears clearly in the results. The conver- 
gence speed of the algorithm exhibits large variations when 
units are changed. This is not the case for the other two 
methods. Moreover, according to our experience, the precon- 
ditioned conjugate gradient method always provides results at 
least as good as those produced by the unpreconditioned con- 
jugate gradient technique, and should therefore be preferred. 

However, comparison of simultaneous preconditioned con- 
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Fig. 10. Evolution of F as a function of time for: alternated conjugate 
gradient for CSI (ACG) algorithm with scaling 1, preconditioned conjugate 
gradient (PCG) algorithm with scaling 1 , unpreconditioned conjugate gradient 
(CG) algorithm with scaling 1 and unpreconditioned conjugate gradient 
algorithm with scaling 2. For preconditioned conjugate gradient and alternated 
conjugate gradient for CSI algorithms, the results are identical for scalings 1 
and 2. (a) small square cylinder object, (b) circular cylinder object 




(a) 



(b) 



Fig. 11. Real parts of the contrasts of objects under test tested in [34]. (a) 
One cylinder object (b) two cylinder object. Both objects under test have a 
purely real contrast. 



jugate gradient and block-component optimization is rather 
inconclusive, as the nature of the object under test seems 
to have a significant impact. Indeed, while the simultaneous 
scheme is a little bit faster for the small square cylinder 
object, its convergence speed is not even competitive for the 
circular cylinder object. Those variations prevent us from 
systematically favoring one type of algorithm over the other. 
More details are given on this subject in the next subsection. 



C. Experimental data 

We now compare the studied algorithms using the experi- 
mental data published in [34] (the 3 GHz dataset was used). In 
this paper, a quasi 2-D setup is used to perform measurements 
over two different objects under test, presented in Fig. 1 1 (a) 
and (b). They will be referred to as the one cylinder and two 
cylinder cases, respectively. Their contrasts are purely real. 
For more detail on the setup, see [34]. 

In our tests, parameter A was set as follows: For CSI, we 
always used Aqsi as defined in (7). For alternated conjugate 
gradient for CSI and preconditioned conjugate gradient, two 
values were used: the final value of Acsi (test 1); a heuristic 
value offering a good trade-off between solution quality and 
computation time (test 2). Finally, we used the same regular- 
ization factor Ai for all tests and all three methods. 

Fig. 12 presents the evolution of A^, as a function of 
time (the evolution of F cannot be used here for comparison 
purpose since the criteria are not the same for aU algorithms). 
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Fig. 12. Evolution of Ax in function of time for CSI algorithm, alternated 
conjugate gradient for CSI (ACG) method and preconditioned conjugate 
gradient (PCG) algorithm. A = Acsi for the CSI method. For alternated 
conjugate gradient for CSI and preconditioned conjugate gradient methods: 
(a) A equal the final value of Acsi , (b) A set heuristicaUy. Data from [34]. 



Figs. 12 (a) and 12 (b) present the results for test 1 and 2, 

respectively. 

The results of the first test show that alternated conju- 
gate gradient for CSI and preconditioned conjugate gradient 
provide better solutions than CSI. This seems to contradict 
the fact that the values of A were selected so as to obtain 
identical criterion values at the solution point. Actually, this 
difference can be attributed to the fact that the CSI algorithm 
does not converge toward a local minimum of the criterion, as 
illustrated in Subsection V-B. 

Moreover, the alternated conjugate gradient for CSI method 
is faster than CSI in the first test. This once again suggests that 
the better use of conjugacy made in the alternated conjugate 
gradient for CSI method pays off. 

The second test also reveals that the hand tuning of A yields 
a significant increase of the speed of alternated conjugate 
gradient for CSI and preconditioned conjugate gradient al- 
gorithms without decreasing the solution quahty. Indeed, for 
both algorithms, convergence is about 3 time faster when A is 
set by hand while the mean square errors are almost the same. 

Finally, preconditioned conjugate gradient is from 5 to 
10 times faster than alternated conjugate gradient for CSI 
according to the chosen stopping rule and the value of A. 
Further research should be conducted in order to explain such 
large variations in convergence speed, but a first analysis 
suggests that the preconditioned conjugate gradient algorithm 
is particularly efficient for "easy" object under test, i.e., for 
small object under test and/or object under tests with a low 
contrast. 

We performed the same two tests with the two cylinder 
object under test using the same parameters than for the one 
cylinder case. Results were quite similar and are not presented 
here. They nevertheless confirm, as stated in Subsection IV-A, 
that a set of parameters A and Ar which is efficient for a 
given contrast, will remain efficient for a whole set of similar 
contrasts. This then confirms the possibihty of choosing the 
value of A and Ar by performing a training step using known 
contrasts. 

VI. Conclusion 

Both the criterion form and the optimization scheme of the 
CSI method were analyzed. We estabUshed that the weight 
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{Minimization of a quadratic function f{v) = v'^Av — 
23i{b^v) + c with respect to v, where A is a Hermitian 
matrix, v and b complex-valued vectors and c a real- valued 
constant} 
Initialize v 

g^Av-b {i.e., g = V.//2} 

k^O 

repeat 

p=\\9r 

if fc = then 

P< 9 

else 

p< g + Pp 

end if 
h Ap 

a <— p/p^h 
V ^ V + ap 
g ^ g + ah 

II 2 

until Sufficient decrease of p= \\g\\ 



factor prescribed in the CSl method was not suitable and could 
lead to a degenerate solution. We also underlined that the CSl 
optimization scheme does not take advantage of conjugacy in 
an optimal way. 

We then proposed two new methods, both making use of the 
same criterion which is similar to the one used in CSl. How- 
ever, the weight factor is set heuristically. We put forward that 
solution quality and computation time were directly related to 
the value of this factor. The role of the regularization term was 
also investigated and we chose to use a regularization penalty 
term in our criterion. 

Alternated conjugate gradient for CSl, despite nested it- 
erative algorithms, appears to be faster than CSl. We also 
proposed a preconditioned conjugate gradient algorithm for 
simultaneous updates of the unknowns. The latter scheme is 
insensitive to the relative scale of the data. Our results indicate 
that preconditioned conjugate gradient is sometimes faster and 
sometimes slower than block component approaches. Indeed, 
their respective behavior strongly depends on the nature of 
object under test. 

A more precise study of the behavior of block-component 
and simultaneous update approaches should be undertaken in 
order to better evaluate their relative performance. We could 
then expect to determine which algorithm should be favored 
according to the experimental conditions. 

Obviously, an unsupervised method for tuning the weight 
factor could be of great interest. In some cases the hand- 
tuning of A from training step is a good choice but it may 
be sometimes difficult to apply. 

Appendix 

A. Proof of the degeneracy of minimizers of Fcsi = Fi + 

^081^2 

Let us first assume that there exist a set W of current 
distributions Wi such that Fx cancels, i.e., 

Vi = GoWi (18) 

and a point of the domain D where the total field E^+GcWi 
simultaneously cancels for all i, i.e., 

eliE° + GcW,) = (19) 

for some k G {1, . . . , n} where ei~ is the kth basis vector. 
Then we have 

F{x, W)^ Fi = (20) 

provided that Ar = and Xk — > oo. Indeed, in such conditions, 
F2 does not depend on Xk, according to (6c) and (19), while 
the denominator of Acsi does, in such a way that Aqsi 
decreases to zero when Xk takes arbitrarily large values. 

Finally, let us justify the existence of such a set of current 
distributions. For fixed values of i and k, constraints (18) 
and (19) together form a hnear system of N + 1 equations, 
depending of the n unknowns formed by Wi. Since n ^ N 
(typically, n « N^), this system is likely to be undeterminate, 
i.e., it has infinitely many solutions. This actually holds for 
any values of i and k, hence the proof. 



TABLE rv 

Linear conjugate gradient algorithm in the case of a 
complex- valued unknown vector 



B. Conjugate gradient algorithms for complex unknown val- 
ues 

We present here the linear and nonUnear Polak-Ribiere- 
Polyak conjugate gradient algorithms in Table IV and Table 
V, respectively . In all cases, g = W^f denotes the gradient 
of a criterion / with respect to an unknown vector v, p 
the descent direction induced by conjugate gradient, (3 the 
conjugacy factor that allows to compute the successive con- 
jugate descent directions —g + j3p, a the optimal step length 
argmin^ f{x + ap*'), and k the current iteration number. 

Tables IV and V address the case of complex-valued un- 
known vectors, which corresponds to the relevant situation 
here, while classical numerical optimization textbooks such 
as [29] are restricted to real-valued vectors. Indeed, it can be 
shown that our complex-valued implementations of conjugate 
gradient produce the same iterations as the reference real- 
valued schemes, where all complex-valued quantities would 
be replaced by an equivalent real-valued representation. For 
instance, x should be substituted by a real vector of length 2n 
such as [^{x)\ 

C. Optimal step length calculation for simultaneous optimiza- 
tion schemes 

For the simultaneous optimization schemes, the optimal step 
length a is the minimizer of F{x -\- apx, [wi -\- api]i). Ac- 
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{Minimization of a non quadratic function / with respect 
to v} 

Initialize v 
repeat 

if fc = then 

P< 9 

else 

p< g + pp 

end if 

a ^ argmiiiQ, f{v + ap) 
V ^ V + ap 

gold ^ g 

II 2 

until Sufficient decrease of \\g\\ 

TABLE V 

Nonlinear Polak-Ribiere-Polyak conjugate gradient 
algorithm in the case of a complex- valued unknown vector 



cording to (6a), (6b), (6c) and (10), F is a quartic polynomial 
function of a with real coefficients 

F{a) = Ro + aRi + a^i?2 + a^Ra + a^Ri (21) 

where 

i?o=F(x,W) 

i?2 =^ (||GoP.f + A (2^{rls^i) + |k2.f )) 

i 

+ K\\T>p^f 
Ra =2X^?li{qls2i) 

R4=xj2\\s2^r 

i 

and 

rii = yi- GoWi 

r2. = X(£;0 + GcWi) - Wi 

Tr = Da; 

q2i = dia,g{p^}{E° + G^Wi) + (XGc - I)pi 
S2i = dia.g{px}G cPi . 

By necessary condition, the minimizer a cancels the derivative 

of F{a) 

F'{a) =Ri + 2aR2 + Sa^Rs + ia^R4 (22) 



which is a cubic polynomial function with real coefficients. 
Thus, a can be calculated exactly: it suffices to determine the 
solutions of F'{a) = (for instance, by Cardano's method), 
and to select the one that minimizes F{a) among the real 
solutions. 

D. Expression of preconditioner P 

We detail the expression of the proposed preconditioner P. 
In the preconditioned conjugate gradient method the unknown 
vector is formed by (a;,W). The proposed preconditioner is 
then the inverse of the diagonal matrix formed by the diagonal 
entries of Q followed by M instances of the diagonal entries 
of A 

P = diag{diag{Q}*, diag{A}*, . . . , diag{A}*}-i . (23) 

According to (12) and (15), all these elements can be com- 
puted off-hne or at a relatively low cost (term by term vector 
multiplications) except for diag{(XGc — I)^(XGc — I)} that 

can be expressed as 

diag{(XGe-I)^(XGe-I)} 

= IGcHa^p - 23?(diag{diag{Gc}}a;) - 1 

where |Gcp and jeep represent the matrix and the vector 
formed with the square modulus of the entries of G^ and 
X, respectively. Therefore, evaluation of this term essentially 
requires one multiplication by a n x n full matrix at each 
iteration. 
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